PREFACE


This report describes part of a comprehensive and continuing pro- 
gram of research concerned with advancing the state-of-the-art in remote 
sensing of the environment from aircraft and satellites. The research 
is being carried out for NASA's Lyndon B. Johnson Space Center, Houston, 
Texas, by the Environmental Research Institute of Michigan (ERIM). The
basic objective of this multidisciplinary program is to develop remote 
sensing as a practical tool to provide the planner and decision-maker with 
extensive information quickly and economically. 

Timely information obtained by remote sensing can be important to 
such people as the farmer, the city planner, the conservationist, and 
others concerned with problems such as crop yield and disease, urban land 
studies and development, water pollution, and forest management. The 
scope of our program includes: 

1. extending the understanding of basic processes. 

2. discovering new applications, developing advanced remote- 
sensing systems, and improving automatic data processing to 
extract information in a useful form. 


3. assisting in data collection, processing, analysis, and ground- 
truth verification. 

The research described herein was performed under NASA Contract 
NAS9-14123, Task 12, and covers the period from May 15, 1975 through 
May 14, 1976. Andrew Potter (TF3) was the NASA Contract Technical Moni- 
tor. The program was directed by Richard R. Legault, Vice-President of 
ERIM and Head of the Infrared and Optics Division, Jon D. Erickson, Head 
of the Information & Analysis Department, and Richard F. Nalepka, Prin- 
cipal Investigator and Head of the Multispectral Analysis Section. 

The authors acknowledge the excellent programming support of 
A. Pentland. The authors recognize and appreciate the illumination pro- 
vided by H. Horwitz and J. Colwell of ERIM, and M. Trichel and R. Heydorn 
of Johnson Space Center. In addition, numerous other individuals
contributed to the general concepts of this report.
SUMMARY

The purpose of this work is to develop computer techniques for 
assisting an Analyst-Interpreter (AI) in the task of training field 
identification. The result of the work has been to develop an integra- 
ted man-machine approach to the problem of local proportion estimation 
in large scale agricultural remote sensing systems, of which LACIE 
(Large Area Crop Inventory Experiment) is an example. The approach 
builds on the current LACIE system structure. 

Local proportion estimation has two major aspects: the
organization of the data of the sample segment in such a way as to simplify
the task of AI identification of the training fields and to integrate 
the AI designations more closely into the proportion estimation pro- 
cess; and the actual process of designating training fields whether by 
AI, by computer, or by a joint effort of both. A partial system for 
performing the second task is described, and the conceptual basis of 
the approach is explored. A complete system for performing the first 
of the above tasks is described; the system has been implemented using 
ground truth in lieu of AI designations; and a few examples of
proportion estimates are given.
INTRODUCTION

The Classification and Mensuration Subsystem of LACIE covers three 
principal functional areas:

1. Training Field Identification 

2. Training Statistics 

3. Proportion Estimation. 

The training field identification function in LACIE is carried
out by Analyst-Interpreters (AI's). Training statistics may be 
extracted from the identified training fields and be used locally 
(in the same sample segment) for proportion estimation in the LACIE II 
processor; or training statistics may be extracted from the identified 
training fields, be modified in some way, and be used non-locally (in
some other sample segment) for proportion estimation. 

The work accomplished under this task was directed to the objec- 
tive of assisting the AI's in performance of the training field identi- 
fication function, either by helping AI's identify wheat training 
fields, or by direct computer identification of wheat training fields.
However, a slight generalization leads to the statement that the
objective is to create an AI-computer system for performing local proportion
estimation. 

The general approach chosen for this task was to examine the 
functions which must be performed to identify wheat fields at a level 
of generality that would be valid whether a man or a computing
machine were carrying out the task; to identify the functions which 
might best be carried out by the computing machine in the near or 
intermediate term future; and to attempt to implement some or all of 
these functions.

The functions which were considered were restricted to those which 
could be carried out using data which could reasonably be obtained in an 
operational large area crop inventory system utilizing Landsat data,
namely: the Landsat image data; historical crop calendars, cropping
practices, crop acreages, meteorological data; current crop calendars,
meteorological station and meteorological satellite data; and various 
spectral and spatial features drawn from the Landsat images. The 
Landsat images as used by AI's or as processed for proportion estima- 
tion are termed image data; all other types of data will be termed 
ancillary data. 

Since there are choices available in precisely what types and 
sources of data to use, there is no unique set of functions which will 
be necessary and sufficient to identify wheat fields or to perform an
estimate of wheat proportion. Our approach will be to identify as 
many functions as we can and place them in a reasonable order in a 
function flow diagram. 

The observation of the behavior of the AI’s within the context 
of the LACIE system is a fruitful source for identifying specific 
functions. Other portions of the LACIE system can provide ideas as 
well. In addition, we want to be alert to the possible existence of 
useful functions not being carried on anywhere in the LACIE system. 
FUNCTION ANALYSIS

Within the LACIE system, or a follow-on agricultural remote 
sensing system, we have thought of the AI as providing starting points
for both local and non-local recognition. However, there has been no
notable success in non-local recognition even though rather
significant success has been attained in local recognition (when
particular useful combinations of passes were available). At this point
in time, we perceive two broad reasons for failure of non-local recognition.

One of these is the influence of external effects such as viewing angle,
atmospheric haze, and water vapor. It has been well demonstrated now
that failing to correct these effects will result in essentially random
proportion estimates [14]. These effects can be corrected by a variety
of means being pursued throughout the SR&T effort under the title of
signature extension.

The second broad reason for failure of non-local recognition can
be seen in terms of a sampling problem. The individual fields or regions 
within a sample segment can be modeled as being the result of some ran- 
dom draw from a large region called a partition. Within this large re- 
gion the variability of signatures may be large, but the statistics of the 
variability are stationary. The collection of fields actually occurring 
in any particular sample segment is thus a small sample from a broad 
distribution. The chances are small that the sample statistics of the 
training fields picked by the AI in the training (local) sample segment 
will match the sample statistics of the fields in the recognition seg- 
ment. 
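The sampling argument can be illustrated with a small simulation. This is a sketch under simplifying assumptions that the report does not state: each field's signature is reduced to a single number drawn from a normal distribution with a fixed partition mean and standard deviation, and a training segment "matches" a recognition segment when their mean signatures differ by less than a chosen tolerance. The functions `segment_mean` and `match_rate` and their parameters are hypothetical.

```python
import random
import statistics


def segment_mean(rng, n_fields, partition_mean=0.0, partition_sd=1.0):
    """Mean signature of n_fields drawn at random from a stationary partition
    (one-number signatures: an illustrative simplification)."""
    return statistics.mean(rng.gauss(partition_mean, partition_sd) for _ in range(n_fields))


def match_rate(n_fields, tolerance, trials=2000, seed=1):
    """Fraction of trials in which a training sample and a recognition sample,
    each of n_fields fields, have mean signatures within `tolerance`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        training = segment_mean(rng, n_fields)
        recognition = segment_mean(rng, n_fields)
        if abs(training - recognition) <= tolerance:
            hits += 1
    return hits / trials


# A handful of fields per segment versus a large pooled sample.
print(match_rate(n_fields=10, tolerance=0.1))
print(match_rate(n_fields=200, tolerance=0.1))
```

Under these assumptions normal theory predicts a match rate of roughly 18% for ten fields per segment and roughly 68% for two hundred, which is one way to see why pooling training fields from many segments, as argued below, should help.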

One might ask why it is that local recognition seems to work rather 
well. Surely the same sampling differences occur within sample segments
as between them; why is it not equally true that the chances are small
that the sample statistics of the training fields picked by the AI will
match the sample statistics of the remainder of the fields in the local
sample segment? The reason is that the AI deliberately tries to pick
fields which are representative of the local segment. In effect, he 
analyzes the statistics of the entire segment and then picks individual 
fields which are representative of the entire segment. In order to 
obtain the same success in non-local recognition, the AI would have to 
see the non-local segment and pick fields from the training segment 
which were representative of the non-local segment. In most cases 
such fields would by chance not exist. Even if they did, the AI could 
not examine every recognition segment. 

Evidently, the key is to select fields which are representative of 
the entire partition. In order to do this, it will be necessary to
find training fields in many sample segments and to correct external 
effects among and between these segments in order to establish a com- 
posite training data base for the partition. 

The above paragraphs have been directed to the subject of non- 
local recognition or signature extension. Let us return now to the 
task of local identification of crops. 

If the AI must be involved in every local segment, we at least 
would like to be able to relieve him of some of the burden through com- 
puter techniques. We see the goal of a gradual reduction in the AI 
work load at each site until finally he is in the role of monitoring 
and checking computer results. We are encouraged that this goal may be 
a practical one by the early results of the delta classifier [12].

With the above paragraphs as an introduction, we now briefly 
review the functions which the AI must perform in order to identify 
field crops, and indicate the corresponding computer support functions 
which we have developed. 

Table 1 lists the AI functions which we perceive at this time,
together with the associated computer support functions. Notice that
one computer support function, isolate and flag field pixels, is shown
opposite two separate AI functions. In general, there is no exact
correspondence between the beginning and end of AI steps and computer steps.

TABLE 1. LIST OF CONCEPTUALIZED AI FUNCTIONS
AND RELATED COMPUTER SUPPORT FUNCTIONS

AI FUNCTIONS                          ASSOCIATED COMPUTER SUPPORT FUNCTIONS

Locate fields.                        Isolate fields and flag all pixels
Fix fields.                           belonging to the particular fields.
                                      Present to AI.

Estimate field characteristics.       Compute spectral-temporal statistics
                                      of each field.

Tentatively identify fields.          Suggest field identification to AI
                                      for his verification.

Review field characteristics and      Cluster fields into higher groups.
select a representative set of        Select a representative set of
fields.                               fields.

Final field identification.           Suggest field identification to AI
                                      for his spot checking.

It is not necessary to implement all computer support functions 
in order to benefit from some of them. However, each different partial 
implementation requires a different interface with the rest of CAMS. 

A progression of examples is shown in Figures 1(a) through 1(e). We 
show this entire set of figures to demonstrate the gradual changes in 
system function which are possible without premature commitment to the 
final stage. 

Figure 1(a) shows the current configuration with emphasis on the 
AI. The ancillary data in the form of crop calendars and cropping prac- 
tices comes to the AI along with film products from the Landsat imagery. 

In Figure 1(b) an additional data product is delivered to the AI, 
namely a transparency in which the flagged field center pixels are printed 
and the non-flagged boundary pixels are suppressed. Using a lined
overlay, the AI can designate the center of any group of pixels by
approximate line and point. The computer then matches up the line and point with
the mean line and point of a field. These actions replace the drawing 
of fields and the extraction of vertices. The flagging of field pixels 
can be accomplished based on all of the passes available to that date, 
i.e., field isolation can be based on multitemporal data. 

In order to define fields, the computer will be calculating field
statistics anyway, so these statistics can be saved and used for train- 
ing the LACIE processor, rather than recalculating statistics from the 
training fields. This option is shown in Figure 1(c). 

In Figure 1(d), the AI is provided with an additional source of 
information, namely a list of groupings of fields according to their 
multitemporal/multispectral characteristics. This list supplements the 
AI’s own visual characterization of the entire scene and assists him in 
choosing representative fields. For example, after choosing a set of
training fields, the AI can go through the listing and determine whether
he has chosen at least one from each group. The AI could also identify
an entire group of fields as being wheat or non-wheat. If all the
groups were thus identified, the proportion estimation could be
accomplished by tallying the various flagged pixels.

FIGURE 1(a). FUNCTION FLOW, CURRENT CAMS SYSTEM

FIGURE 1(b). FUNCTION FLOW WITH PARTIAL COMPUTER
SUPPORT TO ISOLATE AND FLAG FIELD PIXELS

FIGURE 1(c). FUNCTION FLOW WITH PARTIAL COMPUTER
SUPPORT TO ISOLATE AND FLAG FIELD
PIXELS AND PRECALCULATE STATISTICS

FIGURE 1(d). FUNCTION FLOW WITH PARTIAL COMPUTER
SUPPORT TO ISOLATE AND FLAG FIELD PIXELS,
PRECALCULATE STATISTICS, AND GROUP FIELDS
ACCORDING TO THEIR STATISTICS

FIGURE 1(e). FUNCTION FLOW WITH
COMPUTER SUPPORT TO IDENTIFY FIELDS

In Figure 1(e), another new computer support function is added, 
namely the (tentative or final) identification of the fields. In order 
to carry out this function some external information is required, and 
so we show ancillary data going to both the computer and to the AI. 
Actually, the tentative identification of fields is a major increase in 
complexity, a not surprising fact, since it is the prime function of the 
AI in the LACIE system. In order to accomplish this function, the 
computer must, at least implicitly, correct the data for external effects.
It must then classify the fields against some multitemporal/multispectral
model of crops, whether this model is a data bank or an analytic
model. It is pretty clear that crop calendar information will need to 
be in the model to correct for variations in acquisition time. It 
appears that the spectral features chosen to enter the model might be 
similar to the ones the AI uses, namely color and brightness, rather 
than individual channel responses. 

In the above discussion, we see two main classes of computer
support function: those which require no ancillary data but which
organize the sample segment for the AI in varying degree, and those 
which assist the AI in identifying fields. The purpose of the first 
class is either to relieve the AI of some cumbersome detail to make it 
possible for him to work more sample segments, or to assist him in 
making certain that his field selections are representative. The
ultimate purpose of the second class is to replace most of the AI's in the
system. Both of these classes of function are discussed in more detail
in later sections.
SUPPORTING CAPABILITIES

4.1 FIELD ISOLATION AND ANALYSIS 

In the earlier Section 3 several functions were identified in a 
hierarchy of possible assistance to the AI. These included the iso- 
lation of fields, the flagging of field center pixels so that they are 
associated together and the labeling of field groups of pixels so that 
the AI can point to them in some way; these are all field isolation 
steps. Additional steps are the calculation of the signatures of the 
groups of pixels associated into fields, the collection of fields into 
spectrally similar groups and the selection of representative fields 
from those groups. 

Several different methods of field isolation were tried and have
been mentioned in previous quarterly reports.* These are the method
of boundary elimination by gradients, the method of boundary
elimination by nine-point rule (Bayes 9 in this case), and the method
of field building by "line and point clustering" or "spectral-spatial
clustering." In all cases these methods were considered because their
implementation involved modest changes to existing computer programs.
In addition, some consideration has been given to the field building
algorithm of Gupta and Wintz [1], but we have not tried it out,
primarily because it would have to be programmed from scratch for our
computer systems and we appear to have an acceptable alternative in
hand.

The results of the boundary elimination by nine-point rules appear
qualitatively to be inferior to the results from spectral-spatial
clustering or from the gradient method. Of these latter two,
spectral-spatial clustering has the advantage that it automatically
incorporates the calculation of the field statistics (with certain
reservations which will be explained later) and it automatically
designates the field in a way that is potentially useful to the AI,
i.e., by the location of the field center. An additional potential
advantage is that any clustering program could be modified to do
spectral-spatial clustering relatively easily, including ISOCLASS or
MLE. Therefore we will proceed to briefly describe the
spectral-spatial clustering algorithm and present typical results.

* Technical Progress Report, 15 August 1975-14 November 1975, Task 12,
Contract NAS9-14123, Report No. 109600-41-L, December 1975.

The basic idea of clustering algorithms is to group together pixels 
which are spectrally similar in some sense. The idea of spectral-spatial 
clustering is to group together pixels which are spectrally and 
spatially similar in some sense. Normally, each pixel is represented 
by a vector of spectral intensity values, i.e., Landsat MSS counts. 

In spectral-spatial clustering these vectors are augmented by the
addition of two more components, one for the line number and one for 
the point number within the line. 
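The augmentation step can be sketched as follows. This is a minimal illustration, assuming the image is held as a list of scan lines, each a list of pixels, each pixel a list of spectral values (for multitemporal data, the channels of all dates stacked together); line and point numbers are counted from zero here, which is an assumption. The function name `augment_with_position` is hypothetical.

```python
def augment_with_position(image):
    """Return one augmented feature vector per pixel.

    `image` is a list of scan lines; each scan line is a list of pixels,
    and each pixel is a list of spectral values (e.g. MSS counts). Each
    output vector is the pixel's spectral values followed by its line
    number and its point number within the line.
    """
    features = []
    for line_no, scan_line in enumerate(image):
        for point_no, spectral in enumerate(scan_line):
            features.append(list(spectral) + [line_no, point_no])
    return features


# Two lines of three pixels, two spectral channels each (toy values).
toy = [
    [[20, 35], [21, 36], [40, 60]],
    [[19, 34], [22, 37], [41, 61]],
]
vectors = augment_with_position(toy)
print(vectors[4])  # second line, second point -> [22, 37, 1, 1]
```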

The clustering algorithm normally includes or rejects 
Individual pixels from within the clusters based on the probability 
distribution of the points already included in the cluster. The 
sample distribution is assumed to be normal with the mean and 
covariance being the sample mean and the sample covariance of the 
points already in the cluster. The addition of line and point 
channels causes each cluster to be localized, to have a mean line 
and point value and a line and point variance. At the end of the run 
the cluster statistics are printed. These include the means of line 
and point which serve well as indicators of the position of the cluster. 
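The include-or-reject step can be sketched as below. The report does not give the exact acceptance rule, so this sketch rests on stated assumptions: every channel (spectral, line, and point) is treated independently, i.e. a diagonal covariance, a small variance floor keeps early estimates from being zero, and a pixel is accepted when its squared standardized distance from the running cluster mean is below a fixed threshold. The class `Cluster`, the function `try_include`, and the threshold value are hypothetical; 13.28 is roughly the 99th percentile of a chi-square distribution with four degrees of freedom.

```python
class Cluster:
    """Running statistics of the augmented vectors already in a cluster.

    Assumption: channels are treated independently (diagonal covariance).
    The mean and the sums of squared deviations are updated incrementally
    (Welford's method) as pixels are added.
    """

    def __init__(self, first_vector, floor_variance=1.0):
        self.n = 1
        self.mean = [float(v) for v in first_vector]
        self.m2 = [0.0] * len(first_vector)  # sums of squared deviations
        self.floor = floor_variance          # lower bound on each variance

    def variances(self):
        if self.n < 2:
            return [self.floor] * len(self.mean)
        return [max(s / (self.n - 1), self.floor) for s in self.m2]

    def distance2(self, x):
        """Squared standardized distance of x from the cluster mean."""
        return sum((xi - mi) ** 2 / vi
                   for xi, mi, vi in zip(x, self.mean, self.variances()))

    def add(self, x):
        self.n += 1
        for k, xi in enumerate(x):
            delta = xi - self.mean[k]
            self.mean[k] += delta / self.n
            self.m2[k] += delta * (xi - self.mean[k])


def try_include(cluster, x, threshold):
    """Add x to the cluster if it is close enough; report whether it was added."""
    if cluster.distance2(x) <= threshold:
        cluster.add(x)
        return True
    return False


cluster = Cluster([20, 35, 0, 0])
print(try_include(cluster, [21, 36, 0, 1], threshold=13.28))  # True
print(try_include(cluster, [40, 60, 0, 2], threshold=13.28))  # False
```

Because line and point enter the distance like any other channel, a pixel that is spectrally similar but spatially far from the cluster mean is rejected, which is what localizes each cluster.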

In order to avoid a confusion in terms, we have followed the
example of Gupta and Wintz and have adopted the name "blob" for line
and point clusters. This is to distinguish between computer output
groups of pixels and true fields, or fields designated by the AI's.
In general, spectral-spatial clustering will break "large" fields into
more than one blob.
Some additional features of the spectral-spatial clustering
program are: 

a. Provision is made to place a relative weight on the 
importance of the spectral channels versus the line and point channels. 
For multitemporal data the weights are adjusted to place less relative 
importance on the spectral channels. 

b. The raw line and point coordinates are rotated before
clustering to produce coordinates which are lined up with a normal
map grid. In order to use the means to locate points in LACIE sample
segments, they must be rotated back again.

The reason the rotation is appropriate is that, in order to
save time, the clustering algorithm uses only line and point variance
rather than covariance. The overall effect without the rotation would
be that the algorithm would be biased toward producing blobs which are
ellipsoidal in shape but not lined up with the North-South world map.
After rotation, the algorithm is biased towards producing blobs which
are lined up with a North-South map.

c. A recent modification is that, in addition to being rotated,
the line and point coordinates enter the distance through the
criterion $x^4 + y^4$ rather than $x^2 + y^2$. The effect of this
criterion, in the absence of competition from other blobs, is to create
a super-ellipsoid shaped blob, which seems to be more field-like. In
practice, neither the rotation nor the super-ellipsoid criterion seems
to affect the shape of blobs very much. Instead, blobs crowd together to
fill out into corners, and the actual shape seems to be determined by
competition between neighboring blobs.

d. As mentioned, blobs tend to crowd together to completely
fill the sample segment. However, one purpose of this entire exercise
is to identify field center pixels. Therefore, the pixels on the
boundary between blobs are flagged as boundary pixels. The remaining
pixels in each blob constitute "stripped blobs". The stripping of
blobs does not significantly alter their mean line and point coordinates.
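Features b and c above can be sketched in a few lines. The report gives neither the rotation angle nor the way the spectral and positional terms are weighted against each other, so `theta`, the weights, and the additive combination in `weighted_distance` are assumptions, and all function names are hypothetical. The sketch shows the rotation and its inverse, the positional term computed from line and point variances only (no covariance), and how the exponent-4 super-ellipse admits "corner" pixels that the ordinary ellipse rejects.

```python
import math


def rotate(line, point, theta):
    """Rotate a (line, point) pair by theta radians (counterclockwise)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * line - s * point, s * line + c * point)


def unrotate(u, v, theta):
    """Undo rotate(), recovering the original (line, point) for LACIE use."""
    return rotate(u, v, -theta)


def position_term(dx, dy, sx, sy, exponent=4):
    """Positional distance of a pixel from a blob mean, using only the line
    and point standard deviations (no covariance). exponent=2 gives the
    ellipse x^2 + y^2; exponent=4 gives the super-ellipse x^4 + y^4."""
    x, y = dx / sx, dy / sy
    return abs(x) ** exponent + abs(y) ** exponent


def weighted_distance(spectral_d2, dx, dy, sx, sy,
                      w_spectral=1.0, w_position=1.0, exponent=4):
    """Hypothetical combination of spectral and positional terms. Lowering
    w_spectral places less importance on the spectral channels, as the
    text describes for multitemporal data."""
    return w_spectral * spectral_d2 + w_position * position_term(dx, dy, sx, sy, exponent)


# A "corner" offset of 0.8 standard deviations in both coordinates:
print(position_term(0.8, 0.8, 1.0, 1.0, exponent=2))  # about 1.28: outside the unit ellipse
print(position_term(0.8, 0.8, 1.0, 1.0, exponent=4))  # about 0.82: inside the unit super-ellipse
```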
Figures 2(a), (b), (c), and (d) show portions of LACIE sample
segments which have been subjected to multitemporal spectral-spatial
clustering. Figure 2(a) shows the Morton Intensive Test Site (ITS). The
four dates used are:

23 October 1973 
9 May 1974 
27 May 1974 
14 June 1974. 

In Figure 2(a), the ground-truth field lines (and field numbers), 
and an encompassing rectangle are overlaid on the blob presentation. 

The field center pixels are left blank while the stripped pixels are
shown as asterisks. Some fields are missing entirely, since they were
formed into blobs small enough to be stripped entirely away by the
stripping operation. This is no particular drawback, since such small
or ragged fields are not likely to make very good candidates for
training fields anyway.
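The stripping operation can be sketched on a small label map. The report does not say which neighbours are examined; this sketch assumes a pixel is a boundary pixel when any of its four neighbours (up, down, left, right) carries a different blob label, and it treats the image edge as not being a boundary. Under that rule a one-pixel blob disappears completely, as the small fields in Figure 2(a) do. The functions `strip_blobs` and `surviving_pixels` are hypothetical.

```python
def strip_blobs(labels):
    """Flag boundary pixels in a 2-D blob-label map.

    A pixel is treated here as a boundary pixel when any of its four
    neighbours carries a different blob label. Returns a copy of the map
    in which boundary pixels are None and the remaining "stripped blob"
    pixels keep their label.
    """
    rows, cols = len(labels), len(labels[0])
    stripped = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] != labels[r][c]:
                    stripped[r][c] = None
                    break
    return stripped


def surviving_pixels(stripped):
    """Count the unstripped (field center) pixels left in each blob."""
    counts = {}
    for row in stripped:
        for label in row:
            if label is not None:
                counts[label] = counts.get(label, 0) + 1
    return counts


labels = [
    [1, 1, 1, 1, 3, 3],
    [1, 1, 1, 1, 3, 3],
    [1, 1, 2, 1, 3, 3],
    [1, 1, 1, 1, 3, 3],
]
print(surviving_pixels(strip_blobs(labels)))  # {1: 8, 3: 4}: blob 2 is stripped away
```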

A visual survey of Figure 2(a) supports the following statements:

a. Most 160-acre fields provide a potential training field (field
10 and most of its neighbors are 160-acre fields).

b. A majority of 80-acre fields provide a potential training field
(field 32 and some of its neighbors are 80-acre fields).

c. A substantial minority of 40-acre fields provide a potential
training field.

Note that one line of the data is a bad line, i.e., a line filled 
with noisy points in one or more channels. Spectrally each point is 
likely to be extremely different from its neighbors; hence, the line 
tends to produce a large number of very small blobs confined to the line. 
The process of stripping then removes the bad line, and the one above 
and the one below. The line in question is line 248 which passes through 
the circular field number 144, at the left edge of the figure. 

Some large fields are converted into two blobs and so would be used
as two separate training fields having the same ID. This effect is
noticeable in the fields which are large in East-West extent, numbers
049, 051, and 073. Other large fields did not get split up, notably
North-South oriented ones such as number 126, the composite of 009 and
025, and the composite of 010 and 026. The fact that the BLOB program
is a single-pass clustering program, coupled with the slight angle of
the scan pattern relative to East-West running boundaries, provides a
satisfactory explanation of this behavior, and it can be corrected if
need be. However, the fields which were split are 320-acre fields, and
there is no point in using training fields that large anyway, since the
points will be largely redundant.

FIGURE 2(d). N. STEVENS SRS SITE MULTITEMPORAL SPECTRAL-SPATIAL BLOB MAP

In several cases throughout the scene, blobs run across field
boundaries. We have identified thirteen such cases, and the field pairs
and their ground truth ID's are listed below in Table 2. In 10 out
of 13 cases the ID's match. (400, 402, and 404 are all wheat varieties.)
In two of the cases the infringement of the blob across field boundaries
amounts to only a few pixels, but is still significant. One case (013
versus 030) is completely unexplained.

Figures 2(b)-2(d) are blob maps for other segments. In these maps 
the blobs themselves are shown as asterisks. 

Thus, we have here the basis for a computer-AI interaction such
as was shown above in Figure 1(b). The AI can identify the blob by 
its number and associate a crop type label with it. The computer can 
then use all the pixels flagged with that blob number for training. 

Alternatively, the AI can indicate the approximate line-and-point
number of any particular blob and label it as to crop type. Then a
computer program can be written to find the blob number whose mean is
closest to that approximate line and point number. 
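Such a lookup program might look like the sketch below. It assumes the blob means are available as a dictionary from blob number to (mean line, mean point), already rotated back into LACIE sample-segment coordinates, and it uses plain Euclidean distance; the function name and the blob data are hypothetical.

```python
import math


def nearest_blob(blob_means, line, point):
    """Return the blob number whose (mean line, mean point) is closest
    to the approximate (line, point) indicated by the AI."""
    return min(blob_means,
               key=lambda b: math.hypot(blob_means[b][0] - line,
                                        blob_means[b][1] - point))


# Hypothetical blob means: blob number -> (mean line, mean point).
blob_means = {1: (12.4, 30.1), 2: (15.0, 52.7), 3: (40.2, 18.9)}

crop_labels = {}
crop_labels[nearest_blob(blob_means, 14, 50)] = "Wheat"
print(crop_labels)  # {2: 'Wheat'}
```

A nearest-mean rule is only as good as the blob means themselves: as the Morton example discussed with Table 3 shows, a blob that wraps around a corner can have its mean fall outside the field it mostly occupies, so the AI should still confirm the match.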

A supplementary program has been written to cluster the blob
spectral means and produce a listing of the groups of blobs. Within a
group the blobs are organized by the number of pixels in a stripped
blob, the largest being last. The AI can examine this list to see
whether, according to the computer, he has included representatives of
all the spectrally distinct fields in the scene. This input would allow
the implementation of Figure 1(d), discussed earlier in Section 3.
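The listing and the representativeness check can be sketched as follows. The report does not specify the algorithm used to cluster the blob means, so this sketch takes the group assignments as given input and only shows the two bookkeeping steps described above: ordering each group's blobs by stripped-blob pixel count (largest last) and finding groups that contain none of the AI's chosen training blobs. Function names and data are hypothetical.

```python
def group_listing(blob_groups, blob_sizes):
    """For each group, list its blobs ordered by stripped-blob pixel
    count, smallest first and largest last."""
    listing = {}
    for blob, group in blob_groups.items():
        listing.setdefault(group, []).append(blob)
    for group in listing:
        listing[group].sort(key=lambda b: blob_sizes[b])
    return listing


def unrepresented_groups(blob_groups, training_blobs):
    """Groups that contain none of the AI's chosen training blobs."""
    covered = {blob_groups[b] for b in training_blobs}
    return sorted(set(blob_groups.values()) - covered)


# Hypothetical data: blob number -> group, and blob number -> pixel count.
blob_groups = {11: 1, 12: 1, 13: 2, 14: 3, 15: 3}
blob_sizes = {11: 40, 12: 12, 13: 235, 14: 0, 15: 57}

print(group_listing(blob_groups, blob_sizes))          # {1: [12, 11], 2: [13], 3: [14, 15]}
print(unrepresented_groups(blob_groups, [11, 15]))     # [2]
```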
TABLE 2. EXAMPLES OF BLOBS OVERLAPPING
BOUNDARIES BETWEEN PAIRS OF FIELDS

Field Name   Field ID      Field Name   Field ID

002          700           019          700
009          402           025          404
010          700           026          700
013          400           030          700
039          404           040          400
074          700           075          700
073          700           093          400
093          400           094          404
094          404           118          700
118          700           116          700
148          500           149          500
173          700           193          700
185          400           186          400
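The match count quoted for Table 2 can be checked mechanically. This sketch reads the flattened table as thirteen consecutive pairs of fields (an interpretation of the extracted layout) and, following the text, treats IDs 400, 402, and 404 as the same class because they are all wheat varieties. Variable and function names are hypothetical.

```python
WHEAT_IDS = {400, 402, 404}  # per the text, all wheat varieties

# (field, ID, field, ID) for each pair of fields joined by one blob.
pairs = [
    ("002", 700, "019", 700),
    ("009", 402, "025", 404),
    ("010", 700, "026", 700),
    ("013", 400, "030", 700),
    ("039", 404, "040", 400),
    ("074", 700, "075", 700),
    ("073", 700, "093", 400),
    ("093", 400, "094", 404),
    ("094", 404, "118", 700),
    ("118", 700, "116", 700),
    ("148", 500, "149", 500),
    ("173", 700, "193", 700),
    ("185", 400, "186", 400),
]


def same_class(id_a, id_b):
    """IDs match if identical, or if both are wheat varieties."""
    return id_a == id_b or (id_a in WHEAT_IDS and id_b in WHEAT_IDS)


matches = [p for p in pairs if same_class(p[1], p[3])]
mismatches = [(p[0], p[2]) for p in pairs if not same_class(p[1], p[3])]
print(len(matches), "of", len(pairs))  # 10 of 13
print(mismatches)  # [('013', '030'), ('073', '093'), ('094', '118')]
```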
Table 3 shows the groupings of blobs for the Morton ITS. The
number of groups, hereafter called Blob clusters or B clusters, was
limited to a maximum of 30, which was thought to be in line with the
number of possible spectral classes in a scene.

Under each B cluster are shown the blob numbers associated with 
that cluster, the number of blob center pixels (unstripped pixels) in 
each blob and, where the number of unstripped pixels was greater than 
zero, the line and point means for the blob. Blobs are ranked from 
smallest to largest. The number of blobs in a B cluster varies. B 
clusters 1 and 2 contain 52 and 132 blobs respectively. B clusters 
27 through 30 contain 1 blob each. The number of unstripped pixels 
in a blob varies from 0 to 235. 

Simulating our proposed AI methodology, we took the ground informa- 
tion, our multitemporal spectral-spatial cluster map, and the B cluster 
table relevant to Morton and identified B clusters 4 and 10 as all 
Wheat clusters; B clusters 2, 3, 6, 7, 8, 9, and 19 as containing only
"Other"; 4 ambiguous B clusters, numbers 1, 5, 11, and 18, containing
a mix of Wheat and Other; and the rest could not be identified from the 
existing ground truth. (Appendix I contains a listing of field ID's.) 

There was at least one instance when some supplementary informa- 
tion was required to explain an apparent inconsistency in the line and 
point mean of one of the blobs and the very existence of another. In 
Table 3 near the end of the listing for B cluster 4 you will see blob 
number 220. Its line and point means fall on a boundary between 
fields 33 and 52. It was revealed that the blob sitting in field 32 was 
a part of the blob sitting in field 52. The connection between them is 
visible where it occupies field 31 and is lost at the intersection of 
fields 31, 33, 51, and 52. Fields 31, 32, and 52 are identified as 
wheat and field 33 as summer fallow in the ground information for the 
Morton ITS. 




TABLE 3. MORTON ITS BLOB GROUPINGS

From another point of view, if one were attempting to find the blob
number associated with field number 32, one would search the table in
vain to find a pair of line and point means which fall in field 32.

Figures 2(b), (c), and (d) present an assortment of blob maps in 
which the blob centers are filled in and edges blanked out. At the top 
of each encompassing rectangle we show the site name, a code for the 
dates of coverage used in generating these maps, and two digits showing
the year of coverage. The date code is described below:

Letter Code    Time Frame of Data Collection

A              4 October 1973
B              20-23 October 1973
C              18-20 April 1974
D              6-9 May 1974
E              24-27 May 1974
F              12-14 June 1974

For the Lane SRS site, time periods A, C, and E were used; for
McPherson, B, C, D, and E; and for North Stevens, E and F. Clustering
was limited to Landsat bands 5 and 6 from each utilized time period.
This was done to limit the amount of time spent in the clustering process.
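The date codes and the resulting size of the clustering vectors can be tabulated with a short sketch. It assumes, from the text, two spectral bands (5 and 6) per time period plus the two position channels (line and point) added by spectral-spatial clustering; the dictionaries and function names are hypothetical.

```python
DATE_CODES = {
    "A": "4 October 1973",
    "B": "20-23 October 1973",
    "C": "18-20 April 1974",
    "D": "6-9 May 1974",
    "E": "24-27 May 1974",
    "F": "12-14 June 1974",
}

SITE_CODES = {"Lane": "ACE", "McPherson": "BCDE", "North Stevens": "EF"}


def decode(code):
    """Expand a letter code such as 'ACE' into its acquisition periods."""
    return [DATE_CODES[letter] for letter in code]


def feature_count(code, bands_per_date=2, position_channels=2):
    """Length of each spectral-spatial clustering vector: bands 5 and 6
    for every time period, plus line and point."""
    return bands_per_date * len(code) + position_channels


for site, code in SITE_CODES.items():
    print(site, code, feature_count(code), decode(code))
```

Under these assumptions the Lane, McPherson, and North Stevens vectors have 8, 10, and 6 components respectively, so restricting each date to two bands keeps the vectors short even for the four-date McPherson run.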

A number of observations may be made upon viewing the maps: 

1. Data sites vary considerably in the spatial texture of their
cluster patterns. This spatial texture is seen to be the most
orderly (blobs of goodly size, regularly distributed) in the
Lane site and the least orderly in the McPherson site. Variations
in orderliness within a map may also be seen.

2. Road patterns (as field boundaries) can be easily spotted in 
the Lane and North Stevens maps. One might not know where to 
drive looking at the McPherson map. Also, more field edges are 
aligned in the top of map to bottom of map direction (rather 
than the typical Landsat slant direction) in the McPherson site 
than in the other two sites. 
3. Blob size is in most cases consistent with field size. If
the blobs were allowed to grow any further, the spatial 
information provided by field edges could be lost. 

4. Bad data lines were treated specially. This was necessary to 
reduce their impact on the maps. The technique used to mark 
the boundaries of fields would have made a triplet of bad lines 
from each single one. Now, a single bad line shows up as a 
single line of blanks in the maps. 

5. Data consisting of fill bits (used to occupy the data space 
beyond the edge of a Landsat frame) is flagged and shows as a 
region of blanks (see the Lane map). 

Going beyond observation, we then attempted to establish bounds on the probable wheat content of each of the 4 sites. Our procedure follows:

Step 1. Using the tables of clustered blobs (B clusters) and ground information from a site, establish the identity of as many of the blobs as possible, starting with the largest in each B cluster.

Step 2. Based on Step 1, establish three classes of B clusters: known and unique (either all Wheat or all Other), known and ambiguous (some Wheat blobs and some Other blobs), and unknown.

Step 3. Tally the total number of pixels in a site and the number of pixels in known and unique B clusters.

Step 4. Establish a lower wheat bound by dividing the number of pixels in known and unique B clusters assigned to Wheat by the total number of pixels in the site.

Step 5. Establish an upper wheat bound by dividing the total number of pixels in the site, minus the number of pixels in known and unique B clusters assigned to Other, by the total number of pixels in the site.





FORMERLY WILLOW RUN LABORATORIES. THE UNIVERSITY OF MICHIGAN 


Step 6. Compute a single point estimate of wheat for a site by dividing the number of pixels in known and unique clusters assigned to Wheat by the sum of those pixels and the number of pixels in known and unique clusters assigned to Other.

Step 7. Express bounds and estimates as a percentage of the scene.

This methodology guarantees the lowest possible lower limit and the highest possible upper limit for wheat: in calculating the lower limit, all ambiguous and unknown pixels are counted as Non-Wheat, whereas in calculating the upper limit, those same ambiguous and unknown pixels are counted as Wheat.
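Steps 4 through 7 reduce to a few ratios of pixel counts. The sketch below is illustrative only: the `SiteTally` structure, its field names and the `wheat_bounds` function are hypothetical, and the pixel counts in the usage line are invented (picked so that the percentages match the Morton figures quoted below), not taken from the report's tallies.

```python
from dataclasses import dataclass


@dataclass
class SiteTally:
    """Pixel counts for one site (hypothetical structure)."""
    total: int         # all pixels in the site
    wheat_unique: int  # pixels in known-and-unique B clusters assigned to Wheat
    other_unique: int  # pixels in known-and-unique B clusters assigned to Other


def wheat_bounds(t: SiteTally) -> tuple[float, float, float]:
    """Return (lower bound, upper bound, point estimate) of wheat, in percent."""
    known = t.wheat_unique + t.other_unique
    if min(t.total, t.wheat_unique, t.other_unique) < 0 or known > t.total:
        raise ValueError("inconsistent pixel counts")
    if known == 0:
        raise ValueError("no known-and-unique pixels; Step 6 is undefined")
    # Step 4: every ambiguous or unknown pixel is counted as non-wheat.
    lower = t.wheat_unique / t.total
    # Step 5: every ambiguous or unknown pixel is counted as wheat.
    upper = (t.total - t.other_unique) / t.total
    # Step 6: only the known-and-unique pixels enter the point estimate.
    point = t.wheat_unique / known
    # Step 7: express everything as a percentage of the scene.
    return 100 * lower, 100 * upper, 100 * point


# Invented counts, for illustration only.
print(wheat_bounds(SiteTally(total=1000, wheat_unique=276, other_unique=453)))
```

Because the known-and-unique pixels are a subset of the site, the Step 6 point estimate computed this way always lies between the Step 4 and Step 5 bounds.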

For Morton, the site is represented by a rectangle which surrounds the ITS and uses the ITS corners as its own. For the Lane, McPherson and North Stevens sites, statistics were computed for two areas each: the first is a near-perfect fit to the outline of the SRS site; the second is an expansion around the SRS site, including lines 41 through 157 and point numbers 1 through 196. This second area represents an SRS site reprogrammed as an ITS.* Figure 3 shows the results of the wheat estimation. The crossbar on the line connecting the lower and upper bounds for wheat percentage represents the single point estimate from Step 6. Actual percentages of wheat for the Morton ITS and the three SRS sites are shown as asterisks.

* This definition corresponds to the one being used by RT&E Branch to establish expanded ground truth for these sites.

Upper and lower bounds on the wheat estimate for Morton were 54.7% and 27.6%, respectively; the single point estimate was 37.9%; and the actual percentage based on ground information was 40.4%. As with Morton, the Lane SRS site results were in close agreement with ground information: upper bound, lower bound and single point estimate for the Lane SRS site were 30.2%, 27.2% and 28.0%, respectively (actual percentage 30.7%). In the remaining two sites, uncertainty in the wheat estimate increased, as indicated by the spread of the bounds. Lower and upper bounds were 8.9% and 83.7% for the McPherson SRS site and 24.4% and 84.9% for the North Stevens SRS site. These quantitative measures seem to follow the qualitative observations (spatial texture in particular) presented earlier. In spite of the wide separation of the bounds for the McPherson and North Stevens SRS sites, their single point estimates of 35.2% and 61.8% are reasonably close to their actual wheat percentages of 40.6% and 55.9%.

In three out of four cases we underestimated the amount of wheat at a site, coming within about 5 percentage points. In one case we overestimated by about 6 percentage points. Figures for actual wheat in the expanded SRS sites are not yet available; hence, our estimates for the expanded sites may be regarded, for now, as predictions.

In a more utilitarian vein, we must ask how useful blob maps and associated tables would be to the AIs. Granted that with them the AI could nearly perform a classification of a data set (a ready means of tallying pixels per cluster in a site is needed, among other things), we are uncertain whether some of the materials might be more effective in other formats. For example, the blob maps could be produced at the same scale as the Product One imagery the AIs use, to aid in associating blobs on one with color patches on the other. To obtain some near-term feedback on these questions, samples of available computer aids (blob maps and so on) should be provided to the AIs for evaluation and commentary.

Future work should be concerned with improving the materials AIs can use. Under consideration are B cluster maps, which would look like the blob maps for the three SRS sites discussed in this report except that, instead of the blob centers being filled with asterisks, they would be filled with the number of the B cluster into which the corresponding blob was grouped. In addition, some means should be found by which the AIs can tally pixels assigned to specific B clusters. AIs should also have the capability to recluster blobs assigned to ambiguous B clusters. Reducing the number of blobs assigned to ambiguous B clusters will reduce the difference between the wheat upper bound and wheat lower bound as we have defined them, and hence reduce our uncertainty about any single point estimate of wheat proportion at a site.
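A pixel tally by B cluster, of the kind called for above, needs little more than a blob label per pixel and the blob-to-B-cluster grouping. The function below is a hypothetical sketch: the nested-list map layout, the use of None for flagged pixels (bad lines, fill bits) and the names are assumptions, not an existing ERIM or LACIE tool.

```python
from collections import Counter


def tally_b_clusters(blob_map, blob_to_b):
    """Count pixels per B cluster.

    blob_map holds one blob label per pixel, scan line by scan line; None
    marks a flagged pixel (bad line or fill bits), which is skipped.
    blob_to_b maps each blob label to the B cluster it was grouped into.
    """
    counts = Counter()
    for row in blob_map:
        for blob in row:
            if blob is not None:
                counts[blob_to_b[blob]] += 1
    return counts


# Tiny hypothetical example: 2 scan lines x 3 points, one flagged pixel.
blob_map = [[1, 1, 2],
            [1, 3, None]]
blob_to_b = {1: "B1", 2: "B2", 3: "B2"}
print(tally_b_clusters(blob_map, blob_to_b))  # Counter({'B1': 3, 'B2': 2})
```

Summing these counts over the B clusters in each of the three classes of Step 2 gives the pixel totals needed in Steps 3 through 6.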

4.2 FIELD IDENTIFICATION 

The next step in the computer-aided identification of wheat is for the computer to suggest, at least tentatively, the field identity to the AI.

All steps previous to this point have involved only the data within the sample segment, without reference to the world of other sample segments. To this point, the only link to the outside is the AI himself: his experience and his training. In order for the computer to say anything about crop identity, it must also bring in outside information. Basically, this information will be some expectation of the characteristics of the wheat signals, i.e., a multitemporal-multispectral signature for wheat.

Figure 4 is an expansion of Figure 1(e) with emphasis on computer functions. It shows the structure of a complete system for computer-assisted local recognition. Certain of the elements, such as bad line, cloud and cloud shadow flagging, are self-explanatory. For others the general intent is clear, but the reader may at this point have no clear idea where the overall concepts come from. In particular, the haze correction, non-linear feature extraction, and signature model are concepts all related to each other. There is a nexus of concepts and ideas which need to be explained together before any of them makes sense alone.

4.2.1 GENERAL DISCUSSION OF LANDSAT DATA STRUCTURE 

We will start by talking about the gross spectral structure of Landsat data from an agricultural region. Empirical and model elements are combined in a heuristic idea, the Tasselled Cap. We then examine in more detail some of the implications for processing Landsat data to extract information from agricultural scenes and, in particular, to identify wheat clusters.

FIGURE 4. OPERATIONS REQUIRED FOR COMPUTER ASSISTED IDENTIFICATION OF WHEAT TRAINING FIELDS

Figure 5(a) shows a two channel scatter diagram of Landsat 
data in an agricultural scene in Fayette Co., Illinois. The data has 
been compressed by unsupervised spectral clustering of the data points 
in all 4 Landsat MSS channels. The ellipses shown are the unit contour 
ellipses of the normal density function describing each cluster. Shown 
with each ellipse is an arbitrary number followed by the percent of the 
scene represented by the cluster generating the ellipse. The channels 
shown are CH 2 and CH 3 (Landsat Bands 5 and 6). 

Notice in Figure 5(a) the definite boundary region near the diagonal of the two-channel presentation. All of the agricultural data lies to the left of this boundary; to the right of the boundary there is no data. The region to the left shows a definite triangle-like shape, with two vertices on the diagonal and one near the CH 3 axis.

Figure 5(b) shows a similar cluster plot of CH 1 vs. CH 2. Here, all of the data again lies near a diagonal of the space. Thus, we can infer that the triangle-shaped region of Fig. 5(a) is shown edge-on in Fig. 5(b). In the three-dimensional space of Channels 1, 2 and 3, the data structure is a flattened triangle having little thickness.

Figure 5(c) shows a cluster plot of CH 3 vs. CH 4. Again the data lies closely along a diagonal. Viewing only Figures 5(a) and 5(c), one would conclude that, in the 3-space of Channels 2, 3 and 4, the data structure is also a flattened triangle. One can then conclude that the data structure forms a flattened triangle in 4 dimensions, and that is correct.

If one assumes that CH 1 is highly correlated with CH 2 (as it seems to be, based on Figure 5(b)) and that CH 4 is highly correlated with CH 3 (as it seems to be, based on Figure 5(c)), then the last 3 combinations of channels offer no particular surprises; they are, in a manner of speaking, first and second cousins of Figure 5(a). (The high correlation of CH 1 with CH 2 and of CH 4 with CH 3 has sometimes stimulated the comment that Landsat MSS is essentially a two-channel system, and that no information would be lost by throwing away CH 1 and CH 4. On the contrary, there is significant information of several types contained in the 4 channels, as we shall see as this discussion develops.)

What is the physical reason for the data to lie in this flattened triangular structure? Figure 6 shows a model calculation of the reflectance of a crop canopy at two wavelengths, 0.65 µm and 0.75 µm, corresponding to the centers of CH 2 and CH 3. The calculations were made for two soil samples, one dark and the other light, through the life of the crop. Notably, the triangular shape is outlined by the two crop development lines. After the crop canopy covers the soil completely, the two canopies look identical. Figure 6 is extracted from Henderson, Thomas and Nalepka, Reference [2]. The canopy model used was developed by G. Suits [3]. Roughly, what seems to be occurring is that the crop starts its growth on the line of soils. As it grows, the composite reflectance of soil and crop increases in CH 3 because of the presence of cellulose in the plant, while the composite reflectance in CH 2 decreases because the chlorophyll in the plants is highly absorbing. Hence, the radiance typical of green plants is located to the left, at the tip of the triangle.
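The triangle can be mimicked with a far cruder model than the Suits canopy model used for Figure 6: a linear areal mixture of bare soil and full green canopy, with shadowing ignored entirely. In the sketch below the soil and canopy reflectances are hypothetical numbers chosen only to have the right qualitative ordering (green vegetation dark in CH 2 and bright in CH 3); they are not values from Reference [2].

```python
def mix(soil, canopy, cover):
    """Linear areal mixture of soil and green-canopy reflectance (percent).

    cover is the fraction of the ground covered by canopy, from 0 to 1.
    """
    if not 0.0 <= cover <= 1.0:
        raise ValueError("cover must lie in [0, 1]")
    return tuple((1 - cover) * s + cover * c for s, c in zip(soil, canopy))


# Hypothetical (CH 2 at 0.65 um, CH 3 at 0.75 um) reflectances, in percent.
DARK_SOIL = (8.0, 10.0)
LIGHT_SOIL = (25.0, 30.0)
GREEN = (4.0, 45.0)

for name, soil in (("dark soil", DARK_SOIL), ("light soil", LIGHT_SOIL)):
    # Canopy cover of 0, 25, 50, 75 and 100 percent.
    path = [mix(soil, GREEN, k / 4) for k in range(5)]
    print(name, [tuple(round(v, 2) for v in p) for p in path])
```

Every mixture lies on a straight segment from a soil point to the green point, so the family of soils spans a triangle whose apex is the full canopy, and both paths end at the same point once the soil is covered. Shadowing bends these paths into curves.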

Figure 6 attempts to span the range of soil conditions by the terms "light" and "dark." Is this all there is to soils as seen in Landsat data? Condit [4,5] has measured the spectral reflectance of soil samples throughout the United States and analyzed them in terms of their principal spectral components. We have used Condit's data to calculate the soil distribution that would be seen through the Landsat MSS spectral filters [11]. Table 4 shows the soil reflectance mean vector and principal components in Landsat data. We will summarize those results in the following discussion.


FIGURE 6. PHENOLOGY FOR WHEAT (IONIA VARIETY) BASED ON CANOPY MODEL [2]
[Axis: % reflectance at 0.750 µm. Legend: 1A dark bare soil; 1B light bare soil; 2 wheat at emergence; 3 wheat at intermediate stage; 4A,B wheat at mature stage; 5A,B wheat at chlorotic stage; 6A,B wheat at senescent stage]
TABLE 4. SUMMARY OF VARIOUS VECTORS AS SEEN BY LANDSAT
(PERCENT EFFECTIVE REFLECTANCE)

         CH1      CH2      CH3      CH4       |A|
μs      15.57    21.83    25.55    31.14     48.389
V1      14.22    17.36    18.981   20.23     35.681
V2       4.23      .018   -2.03    -2.16      5.165
V3       -.57     2.076   -1.54     1.30      2.949
V4      -1.35      .166     .581    -.1408    1.486

μs is the mean vector of soils.
V1 through V4 are the principal components, whose amplitudes are given as |A|.


Figure 7 gives an idea of the distribution of soil reflectance projected into the 4-dimensional space spanned by the 4 Landsat MSS channels. That space has a "diagonal," i.e., a line along which the normalized reflectance of all channels is equal. The mean reflectance of soils lies near that diagonal. The largest principal component of soil reflectance is nearly parallel to the diagonal. The square root of the eigenvalue associated with the first component is about 35 units (i.e., one standard deviation of the data projected onto the first principal component is about 35).

The second principal component, normal to the first, has a standard deviation of about 5 units, the third of about 3 units, and the fourth of about 1 1/2 units. The unit contour ellipsoid describing the distribution of soils forms a four-dimensional flattened cigar shape, about seven times as long as it is wide, about twice as wide as it is thick, and twice as thick as it is thin (which is the name for distance in the 4th direction). Hence, for some applications we would be justified in describing the data from soil points as the "line of soils," ignoring all but the major component. In other cases, we might speak of the "plane of soils," referring to the first and second components.
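The shape figures quoted above can be checked directly against Table 4. The sketch below recomputes the length of each row of the table (the lengths reproduce the |A| column, which suggests each Vi is a principal direction scaled by its amplitude), the ratios of successive amplitudes, and the angles of μs and V1 to the equal-reflectance diagonal. The numbers are transcribed from Table 4; the code itself is only an illustration.

```python
import math

# Rows of Table 4 (percent effective reflectance, CH1..CH4).
MU_S = (15.57, 21.83, 25.55, 31.14)
V = [
    (14.22, 17.36, 18.981, 20.23),
    (4.23, 0.018, -2.03, -2.16),
    (-0.57, 2.076, -1.54, 1.30),
    (-1.35, 0.166, 0.581, -0.1408),
]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def angle_to_diagonal(v):
    """Angle in degrees between v and the diagonal direction (1, 1, 1, 1)."""
    cos = sum(v) / (length(v) * math.sqrt(len(v)))
    return math.degrees(math.acos(cos))


amplitudes = [length(v) for v in V]
ratios = [a / b for a, b in zip(amplitudes, amplitudes[1:])]

print("amplitudes:", [round(a, 2) for a in amplitudes])
print("successive ratios:", [round(r, 2) for r in ratios])
print("mean soil vs diagonal: %.1f deg" % angle_to_diagonal(MU_S))
print("V1 vs diagonal: %.1f deg" % angle_to_diagonal(V[0]))
```

The successive ratios come out near 6.9, 1.75 and 2, in line with "about seven times as long as it is wide, about twice as wide as it is thick, and twice as thick as it is thin," and V1 lies only about 7 degrees from the diagonal.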

Returning now to Figure 6, we notice again that after the initial development stages the two crop canopy trajectories join and fall back towards the soil line. What cannot be seen in this figure is that the line of falling back is not in the same plane (in the 4-space of Landsat data) as the two development lines up to the point where they join. The crop is yellowing, and yellow-colored things lie in a different direction away from the soil line than do green-colored things.

We now have sufficient information to create the basic image of the 
tasselled cap, shown in Figure 8. 

The basic tasselled cap shown in Figure 8 is created by combining soil reflectance and green stuff and then adding yellow stuff. We say that the crop starts growing on the plane of soils. As it grows it progresses outward, roughly normal to the plane of soils, on a curving trajectory towards the region of green stuff. Next the trajectories fold over and converge on the region of yellow stuff. Finally, the crop progresses back to the soil from whence it came by any of several possible routes, depending on the crop and the harvesting practices.

Initially, we spoke of a flattened triangle; now we are likening the data structure to a tasselled cap. To fit both of these images, the yellow point must be quite close to the side of the cap, and indeed it is.

For wheat, the yellow is also accompanied by shadowing so that the 
yellow point is found near the dark end of the plane of soils. 

The "front" of the cap looks down toward the origin of all data, otherwise called THE ORIGIN. On the front of the cap is the badge of trees. Why the reflectance of trees is located just here will be explained a little further on.

Effects of Shadow

As the crop canopy develops away from the soil, the average reflect- 
ance becomes more green, but at the same time shadows develop. Initially, 
much of this shadow will appear on the soil portions of the composite 
canopy. Thus the reflectance of a crop planted on bright soil will initi- 
ally migrate mainly in the direction of the origin. 

A crop which is planted on dark soil will not show this behavior 
significantly. After all, there is little difference between the radi- 
ance of dark soils and the radiance of shadowed dark soil. 

Once maximum shadowing on the soil has been reached the reflectance 
is more strongly influenced by the addition of green elements to the canopy. 
Thus the trajectory of reflectance values sweeps away from the plane of 
soils. Initially many of the green elements that are added are shadowed 
green elements. Hence, the total reflectance remains low until most of the 
ground is covered. 

In the next stage the canopy loses most of its shadows, reaching a state of full green development. Whether a crop actually reaches this stage depends upon the planting density and upon the way its leaves fit together to make a canopy.

This curving trajectory has been documented by F. Johnson [6] in Fayette County corn field data, and has also been shown in the results of a detailed modeling exercise conducted in other efforts under this contract [7,8]. Interestingly, Johnson has found that corn planted in East-West rows does not show this behavior significantly, whilst corn planted in North-South rows shows a very strong shadow effect. The reason is clear. At the time of the Landsat overpass, the Sun's rays are coming mainly from the east. Sunlight falls down the East-West rows, and shadows fall on the sides of other corn plants rather than in the open rows.
Now we can see why trees occupy the place they do in reflectance space. Trees are green canopies structured so as to create a good deal of shadow.

A Fixed-Linear Transformation 

It is difficult to look at Landsat data and see all of the features described so far. After all, this is a 4-dimensional space we are looking at, and it is hard to be sure we are seeing everything. Therefore, we have developed some transformations of the data which help us see it better.* The only one of these we will discuss at this point is a fixed affine transformation,

$$u = R^{T} x + r \qquad (1)$$

where

x is the Landsat MSS signal vector, expressed in counts

u is the transformed vector, also expressed in counts

r is an offset vector, introduced to avoid negative values in the transformed data

R is a unitary matrix, i.e., the columns of R are unit vectors R1, R2, R3 and R4, which are all orthogonal to each other. Superscript T indicates the transpose. Thus the application of the transformation to the data x results in a pure rotation plus a pure translation.

The components of R are chosen in the following way:

R1 is chosen to point along the major axis of soils in the Landsat data. A particular sample of Landsat data was chosen to derive R1, namely Fayette County, Illinois, June 1973. Visual inspection of Figure 5(a) was used to pick out 12 soil line clusters. The best-fit line to the means of those 12 clusters was chosen as the direction of R1. R1 is called the soil brightness unit vector. The projection of a data point onto R1 is a feature called brightness.

R2 is chosen to point orthogonal to R1 and toward a green cluster in the same data set. Visual inspection of Figure 5(a) was used to identify the cluster. R2 was generated using the Gram-Schmidt orthogonalization procedure. R2 is the green stuff unit vector. The projection of a data point onto R2 is a feature called "green stuff."

R3 is chosen orthogonal to both R1 and R2, and points toward a yellow-stuff point. There was no yellow stuff in the Fayette segment; hence an approximate spectrum of yellow corn was used to simulate or predict the yellow point in the Fayette data. Again the Gram-Schmidt procedure was used to derive the yellow-stuff unit vector.

R4 is chosen orthogonal to R1, R2 and R3. The projection of a data point onto R4 is a feature called "non-such."

* The transformations we have developed have depended in part on the work of F. Johnson.

The values of R1, R2, R3 and R4 are, to the third decimal place:



The offset vector is arbitrary; setting all components equal to 32 seems to work well.
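The construction of R described above (a soil direction, then Gram-Schmidt toward a green point and a yellow point, with R4 completing the basis), followed by Eq. (1), can be sketched as below. The input vectors are hypothetical placeholders in counts, not the Fayette County cluster means or the yellow-corn spectrum, and the rule used here to pick R4 (the coordinate axis with the largest residual) is an assumption, since the text only requires R4 to be orthogonal to the other three; its sign is arbitrary.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residual(v, basis):
    """Subtract from v its projection onto each (orthonormal) basis vector."""
    w = [float(x) for x in v]
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w


def unit(w):
    n = math.sqrt(dot(w, w))
    if n < 1e-9:
        raise ValueError("vector lies (nearly) in the span of the earlier axes")
    return [x / n for x in w]


def build_axes(soil_dir, green_mean, yellow_mean):
    """Return [R1, R2, R3, R4] as orthonormal 4-vectors via Gram-Schmidt."""
    axes = []
    for v in (soil_dir, green_mean, yellow_mean):
        axes.append(unit(residual(v, axes)))
    # R4 ("non-such"): complete the basis with the coordinate axis that has
    # the largest component left after removing R1, R2 and R3.
    candidates = [residual([1.0 if i == k else 0.0 for i in range(4)], axes)
                  for k in range(4)]
    axes.append(unit(max(candidates, key=lambda w: dot(w, w))))
    return axes


def transform(x, axes, offset=(32.0, 32.0, 32.0, 32.0)):
    """Eq. (1): u = R^T x + r, i.e. u_i = R_i . x + r_i."""
    return [dot(r_i, x) + o for r_i, o in zip(axes, offset)]


# Hypothetical inputs (counts), for illustration only.
SOIL_DIR = (20.0, 27.0, 30.0, 14.0)
GREEN_MEAN = (15.0, 12.0, 48.0, 24.0)
YELLOW_MEAN = (30.0, 33.0, 34.0, 13.0)

R = build_axes(SOIL_DIR, GREEN_MEAN, YELLOW_MEAN)
u = transform((25.0, 30.0, 40.0, 20.0), R)
print([round(c, 2) for c in u])  # brightness, green stuff, yellow stuff, non-such
```

Because the rows of R are orthonormal, the transformed point minus the offset has the same length as the original signal vector, which is the "pure rotation plus a pure translation" property stated above.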

The fixed-linear transformation has several potential uses:

a. Simply by projecting the clustered data onto the features of Eq. (1), we can see the data structure easily. We can also examine it to determine to what extent it actually behaves according to our imaginary picture.

b. Potentially, there is significantly less information in some of the transformed channels than in others, whereas each of the original channels carries about equal information. Thus, one might be able to ignore certain of the transformed channels, and this could lead to cost reduction in processing.

c. The transformation of the data allows certain diagnostic features to be extracted which are symptomatic of external effects, such as haze, vapor, illumination angle, and viewing angle.

To picture the data resulting from the fixed-linear transformation, we show Figures 9(a) through (d), which are cluster plots of the data presented in pairs of transformed channels. The data shown is from the Ellis County, Kansas ITS, dated 13 June 1973. Much of the scene is bare soil. (Recall that the transformation was developed on Illinois data.) Notice that transformed channel 1 (TCH 1), which is soil brightness, and TCH 2, green stuff, contain almost all of the variation within the sample segment.

Figure 9(a) shows these two channels plotted against each other. The triangular shape is easily noted, now rotated to the right so that the soil line is parallel with the soil brightness axis. One noticeable effect of the transformation is to increase the apparent size of the tasselled cap, even though no scale factor was built into the transformation. The reason is that in the transformed data we are seeing the tasselled cap directly from the side.

Figure 9(b) shows the yellow feature plotted versus the green feature. 
Notice that the data is greatly compressed in the yellow direction. 



[Figure 9 panels, titled "ELLIS 13J LINEAR"; horizontal axis: CHANNEL 1]
Figure 9(c) shows non-such versus soil brightness. There is evidently no structure at all in the non-such direction. Figure 9(d) shows non-such versus yellow stuff. One could easily believe that these channels together carry only a tiny fraction of the information available in Landsat data. However, yellow stuff does show definite spatial structure at some times, as we will see later.

A second method of presenting transformed data is by viewing transformed imagery. Figure 10 shows green stuff images of a LACIE sample segment in Kansas during 4 successive plant development stages.* The regions at the top and bottom of the segment contain numerous winter wheat fields. The region at the center is rangeland. Figure 11 is the soil brightness image of the same data. Figure 12(b) shows non-such in the 4th biophase and is reasonably typical of non-such and yellow stuff in all of the biophases, i.e., mainly noise with almost no discernible structure. Figure 12(a) is yellow stuff in the 4th biophase. Although the dynamic range of the data is only about 10 counts, comparable to Figure 12(b), strong spatial structure is evident.

Returning to Figure 10(a), we note the rangeland is somewhat green, 
but the fields are not green at all. The roads show up, if at all, as 
slightly green, due to the grass on the roadside. In Figure 10(b), the 
2nd biophase, the fields show up strongly green, while the rangeland is 
still only somewhat green. In Figure 10(c), the 3rd biophase, both range- 
land and winter wheat are green; one can imagine that the rangeland had 
caught up with the wheat. Finally, in Figure 10(d), the 4th biophase, 
the wheat is again not green. These trends are substantially what one 
would expect. 

Returning to Figure 11(a), we see the soil brightness during the first biophase. One striking effect is the way the roads stand out in this image. Notice that the wheat fields are generally, but not entirely, dark. Basically these are bare soil fields, and we could expect some to be light and some dark.

* Q. Holmes, NASA/JSC, transformed the data used in this and the next example, and created the imagery we have used in Figures 10 through 12.

FIGURE 10. TIME PROGRESSION OF GREENSTUFF FEATURE
[(a) Biophase 1, (b) Biophase 2, (c) Biophase 3, (d) Biophase 4; Greenstuff, SS No. 1172]

In Figure 11(b), the wheat fields are dark; we interpret this to mean that the fields have developed shadow in them in the process of growing. In the soil brightness feature, the rangeland is substantially unchanged between biophases 1 and 2. The roads are still bright.

In Figure 11(c), the 3rd biophase, the wheat fields have brightened 
up, as has the rangeland. There is little contrast between the two. 

In Figure 11(d), the 4th biophase, some of the wheat fields are 
bright, others are not. We interpret this to mean that some are harvested 
(no shadows) and others are not harvested. Notice that all of the areas 
that appear to be wheat are yellow in the 4th biophase, but only some are 
bright (see Figure 12(a)). In the 4th biophase the rangeland is again 
moderately bright. The roads stand out by bright contrast. 

Earlier, we commented that there is more than two channels' worth of information contained in Landsat data. Here we have shown an example. The green feature, the yellow feature and the brightness feature are three independent measurements; they could not all be measured and represented by a two-channel Landsat.

A third method of viewing transformed data is by looking at tables of 
cluster statistics or training statistics. This approach is utilized in 
the discussion in the following section. 

The Problem of Correction for External Effects 

We have discussed the Tasselled Cap as a way of integrating the spectral reflectance structure of a Landsat MSS agricultural scene. The reflectance has, for some specified conditions of observation, a corresponding radiance and a corresponding representation in Landsat counts. As the conditions of observation change, however, the relationship between reflectance and Landsat counts changes. By observation conditions we mean such items as the viewing and illumination geometry; the amount of haze in the atmosphere, the amount of H2O vapor, the amount of cirrus cloud, and the height distribution of these in the atmosphere; and also the average ground albedo in the neighborhood of the particular observed points.

Some combination of these effects is without doubt extremely significant to the problem of identifying field types in Landsat data. The very fact that the data within a local area is confined to an extremely flattened structure within the Landsat signal space makes it, in a certain sense, easier to make errors in classification. Figure 13 shows a hypothetical two-channel example in which some external effect shifts the entire set of data points sideways. Two crops, W and V, occupy a narrow region of the space and are easily separable in that region. Assume that we train a classifier on the data from one sample segment, obtaining the signatures W and V. Then assume that the conditions change and the entire region shifts to the position represented by W' and V'. Classification errors will now occur; more than that, the region occupied by the original set of data points will not even include the new set of shifted data points. Figure 13 represents, in an exaggerated way, what really occurs when haze is added to the atmosphere over a scene. The equivalent occurrence in the four-dimensional case of Landsat data would consist of a shift of the entire tasselled cap in the yellow stuff or non-such direction. Such shifts, ranging up to several standard deviations of the yellow stuff channel, have been observed in randomly selected LACIE sample segments in Kansas (where standard deviation refers to the thickness of the entire tasselled cap in the yellow stuff direction).

Table 5 is a list of means and standard deviations in the yellow stuff channel and the non-such channel for several LACIE sample segments randomly selected from Kansas. These are calculated by combining clusters of both wheat and non-wheat fields. The cluster means were averaged to form ȳ, a grand mean in each transformed channel, and the between-cluster variance component was added to the average cluster variance to obtain σ²TOT. In calculating the statistics, certain clusters were identified as clouds in certain passes, and these were deleted from the average. The cloud clusters were detected on the basis of an algorithm developed for the PROCAMS effort [9], which takes advantage of a cloud's high reflectance in the brightness feature and low reflectance in the yellow stuff feature. Clouds (which may be likened to dense haze) are the darkest objects in the yellow stuff dimension. The densest clouds have a value of about zero on the yellow stuff scale. Light clouds, just discernible in imagery, have yellow stuff values in the range of 9 to 15. One particular image, sample segment 1875, pass number 3, has a gradual change from no discernible cloud at the South edge to fairly dense cloud at the North edge. The clusters deleted were all from the Northern half. Still, there is a reduced mean value (20.39) and an increased variance for that entry in the table.

FIGURE 13. HYPOTHETICAL EXAMPLE OF HAZE EFFECT

TABLE 5. SOME EXAMPLES OF YELLOW AND NON-SUCH VARIATION FROM SITE TO SITE AND TIME TO TIME

                  Yellow Stuff        Non-Such
Site    Time      ȳ       σTOT       ȳ       σTOT
1163    1*      22.12     2.08      33.41     1.05
        2       24.87     1.33      29.71     1.48
        3       24.04     1.61      28.46     2.08
        4       18.46     1.74      29.10     1.43
1172    1       26.12     1.21      33.23     1.06
        2       26.24     2.01      28.69     1.33
        3       24.61     1.74      28.49     1.59
        4       28.19     2.25      27.93     1.71
1854    1       24.68     1.42      33.50     1.19
        2       23.42     1.79      29.95     1.36
        3       25.84     1.38      30.59     1.48
        4       24.77     1.60      28.80     1.29
1875    1*      24.37     1.41      33.18     1.01
        2       24.51     1.51      29.07     1.62
        3*      20.39     2.39      29.25     1.44
        4       26.04     2.15      27.54     2.07
1865    1       25.84     1.41      33.10     1.72
        2       23.47     1.84      29.57     1.57
        3       26.47     2.32      27.50     1.60
        4       25.96     2.29      27.10     1.53

* Denotes that clouds were detected and the cloud clusters were omitted from the calculations.
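The pooling behind Table 5 (a grand mean of the cluster means, and a total variance equal to the average within-cluster variance plus the between-cluster variance) follows the usual decomposition of total variance. The text does not say whether clusters were weighted by size, so the sketch below takes optional weights (for example, pixel proportions) and defaults to a plain average; the function name and the example inputs are hypothetical.

```python
import math
from typing import Optional, Sequence


def pooled_stats(means: Sequence[float], variances: Sequence[float],
                 weights: Optional[Sequence[float]] = None) -> tuple[float, float]:
    """Return (grand mean, sigma_TOT) for one transformed channel.

    Cloud clusters are assumed to have been removed from the inputs already.
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one variance per cluster mean")
    if weights is None:
        weights = [1.0] * len(means)
    if len(weights) != len(means) or sum(weights) <= 0:
        raise ValueError("need one non-negative weight per cluster")
    total = float(sum(weights))
    w = [wi / total for wi in weights]
    grand_mean = sum(wi * m for wi, m in zip(w, means))
    within = sum(wi * v for wi, v in zip(w, variances))        # average cluster variance
    between = sum(wi * (m - grand_mean) ** 2 for wi, m in zip(w, means))
    return grand_mean, math.sqrt(within + between)


# Two hypothetical clusters in the yellow stuff channel.
print(pooled_stats([20.0, 24.0], [1.0, 1.0]))
```

With equal weights the example gives a grand mean of 22 and σTOT equal to the square root of 5 (within-cluster variance 1 plus between-cluster variance 4).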

We cannot discern any site-to-site or time-to-time pattern (and hence no
site or time dependency) in the mean values of yellow stuff. The overall
standard deviation of the mean values is 2.1, while the average within-site
standard deviation is also about 2.0. Thus, there appears to be a reasonable
prospect of using this feature, uncorrected, to estimate the amount or level
of haze at a site. A joint use of this feature with other features for haze
correction is being developed and is reported in [10].
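The comparison in the preceding paragraph can be reproduced in a few lines. This sketch uses only the rows fully visible in this excerpt of the table (segments 1854, 1875 and 1865, cloud-affected passes included), so its results will not match the full-table figures of 2.1 and 2.0 quoted above; it only shows how the two quantities are computed. Pooling every per-pass mean into one sample for the "overall standard deviation of the mean values" is our reading of the text, not a documented procedure.

```python
from statistics import mean, stdev

# Yellow stuff (mean, within-site standard deviation) for passes 1-4,
# copied from the table rows for segments 1854, 1875 and 1865.
YELLOW = {
    1854: [(24.68, 1.42), (23.42, 1.79), (25.84, 1.38), (24.77, 1.60)],
    1875: [(24.37, 1.41), (24.51, 1.51), (20.39, 2.39), (26.04, 2.15)],
    1865: [(25.84, 1.41), (23.47, 1.84), (26.47, 2.32), (25.96, 2.29)],
}


def spread_of_means(table):
    """Sample standard deviation of all per-pass mean values, pooled."""
    return stdev([m for rows in table.values() for m, _ in rows])


def average_within_site_sd(table):
    """Average of the per-pass within-site standard deviations."""
    return mean([sd for rows in table.values() for _, sd in rows])


if __name__ == "__main__":
    print(f"spread of means:        {spread_of_means(YELLOW):.2f}")
    print(f"average within-site SD: {average_within_site_sd(YELLOW):.2f}")
```

For these twelve rows the two numbers land close together, roughly 1.7 and 1.8, which is the same qualitative picture the report draws from the complete table.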

Considering non-such, the only pattern we can observe in the table
is that the first pass at each site is systematically higher than the
average by several counts. The sun elevation angle for these passes is about
45°, whereas for all others it is about 60°. Coincidentally, 1 of the
first passes are Landsat-1 data, while all others are Landsat-2 data.
Theoretically, the non-such channel is dominated by the difference between
channels 3 and 4 and ought to be influenced by the amount of water vapor
present in the atmosphere, but we have no empirical evidence of that as yet.
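To make "several counts" concrete, the sketch below measures, for each of the three fully labeled segments in the table, how far the first-pass non-such mean sits above the mean of the later passes. Comparing against the later passes only (rather than the all-pass average the text mentions) is our choice, so that the first pass does not raise its own baseline.

```python
from statistics import mean

# Non-such mean values for passes 1-4, copied from the table rows for
# segments 1854, 1875 and 1865.
NON_SUCH = {
    1854: [33.50, 29.95, 30.59, 28.80],
    1875: [33.18, 29.07, 29.25, 27.54],
    1865: [33.10, 29.57, 27.50, 27.10],
}


def first_pass_excess(passes):
    """Difference between the first pass and the mean of the later passes."""
    first, *later = passes
    return first - mean(later)


if __name__ == "__main__":
    for segment, passes in NON_SUCH.items():
        print(segment, round(first_pass_excess(passes), 2))
```

Each site shows a first-pass excess of roughly 3.7 to 5 counts, consistent with the systematic offset described above; the code cannot, of course, separate the sun-angle explanation from the Landsat-1/Landsat-2 one.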

Figure 13 also shows a shift in the brightness direction. In the
real case, a negative shift in the yellow stuff direction due to haze is
also accompanied by a positive shift in brightness and a negative shift
in greenness, as well as a general contraction in scale (i.e., loss of
contrast). The interactions are complicated and outside the scope of this
report. The key point is that the yellow shift and the non-such seem to be
diagnostic of a physical state of the atmosphere. We are attempting to
exploit these diagnostic features to correct the data for the effects of
haze and viewing angle under another task of this contract, namely,
signature extension [9, 10].

Imagine that a cloud represents an extreme case of haze. Then, to
recapitulate, a cloud would appear extremely shifted in both the negative
yellow stuff and the positive brightness directions. We are continuing
to experiment with a cloud detector based on this idea: if a quantity formed
from the brightness and yellow stuff components of the vector u shown in
Eq. (1) passes a certain threshold, a pixel is labeled cloud. A threshold of
145 works well as long as all components of the offset vector r are the same.
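A minimal sketch of the thresholding rule follows. The exact expression that the report compares against 145 is not legible in this copy; purely as an assumption, the sketch uses brightness minus yellow stuff, which grows with both shifts the text attributes to cloud (brighter and less yellow). The class name, field names, and sample pixel values are hypothetical.

```python
from dataclasses import dataclass

# Threshold quoted in the text; valid only when all components of the
# offset vector r in Eq. (1) are equal.
CLOUD_THRESHOLD = 145.0


@dataclass
class TasselledCapPixel:
    """Hypothetical container for the transformed features of one pixel."""
    brightness: float
    greenness: float
    yellow_stuff: float
    non_such: float


def cloud_score(px: TasselledCapPixel) -> float:
    # ASSUMPTION: the report's exact quantity is unknown here; this simple
    # difference rises as brightness increases and yellow stuff falls.
    return px.brightness - px.yellow_stuff


def is_cloud(px: TasselledCapPixel, threshold: float = CLOUD_THRESHOLD) -> bool:
    """Label a pixel as cloud when its score exceeds the threshold."""
    return cloud_score(px) > threshold


if __name__ == "__main__":
    dense = TasselledCapPixel(brightness=150.0, greenness=5.0, yellow_stuff=0.0, non_such=30.0)
    clear = TasselledCapPixel(brightness=60.0, greenness=10.0, yellow_stuff=25.0, non_such=30.0)
    print(is_cloud(dense), is_cloud(clear))
```

The yellow stuff values in the example follow the text (near zero for dense cloud); the brightness values are invented. Because the offset vector r shifts every component, the threshold would have to be re-derived if r changed.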

Point of All Shadow 

AI’s do not perceive the amplitude in each Landsat signal channel.
Instead, they perceive preprocessed features of color and brightness. By
selectively ignoring brightness they are able to concentrate on the aspect
of wheat growth stage. They can thus ignore such confusing items as soil
brightness and residual shadowing effects of view angle and illumination
angle. It would seem wise to carry out the same kind of transformation
on Landsat data, and express the Landsat signal in terms of "brightness",
"green" and "yellow" developments, and "non-such" for purposes of developing
a computer signature for wheat. (This is a nonlinear correction, as
opposed to the linear transformation described above.)

In principle, there exists a point, located somewhere behind the origin,
called the point of all shadow. As we change viewing angle and illumination
angle, the amount of shadow which can be seen in the canopy varies. The
reflectance of the canopy therefore changes, becoming lighter or darker.
However, the changes are not merely towards or away from the origin, as
they would be if only a change in illumination level were involved; there
is also a color shift, since the radiation reflected from within the shadow
region is more strongly colored than that reflected from the unshadowed
region.

By making a shift of coordinates to the point of all shadow, the data 
can be treated as though the changes in illumination and viewing angles 
did not induce any color shift, but only a brightness change. 

The key idea about the point of all shadow is that all points lying
on any radius from this point are at the same stage of crop development.
This is no doubt not perfectly true; it is in fact only an idea. But a
slightly simpler version of this same idea forms the basis for the
infrared-to-red ratio (i.e., CH 3 divided by CH 2) as a measure of green
biomass [8].
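The ray idea can be checked numerically. The sketch below moves a canopy signal along a ray from a shadow point (more or less shadow, same crop stage) and computes the CH 3/CH 2 ratio two ways: from the origin, as in the ordinary ratio, and from the shadow point. The shadow point and canopy values are invented for illustration, not taken from the report.

```python
from typing import List, Sequence

Vector = Sequence[float]


def ratio_about(point: Vector, x: Vector) -> float:
    """CH 3 / CH 2 ratio of signal x, measured from an origin placed at `point`.

    Vectors hold Landsat MSS channels CH 1..CH 4 at indices 0..3.
    """
    return (x[2] - point[2]) / (x[1] - point[1])


def along_ray(point: Vector, x: Vector, k: float) -> List[float]:
    """Signal obtained by scaling x's offset from `point` by a factor k > 0."""
    return [p + k * (xi - p) for p, xi in zip(point, x)]


ORIGIN = [0.0, 0.0, 0.0, 0.0]
SHADOW = [-4.0, -6.0, -3.0, -2.0]  # hypothetical point of all shadow
CANOPY = [20.0, 18.0, 40.0, 22.0]  # hypothetical canopy signal

if __name__ == "__main__":
    for k in (0.5, 1.0, 2.0):
        y = along_ray(SHADOW, CANOPY, k)
        print(k, round(ratio_about(ORIGIN, y), 3), round(ratio_about(SHADOW, y), 3))
```

Measured from the origin, the ratio drifts from about 3.08 to about 1.98 as the amount of shadow changes, which is the color shift described above; measured from the shadow point, it stays fixed at 43/24. That is the motivation for shifting coordinates to the point of all shadow.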

To be specific, we propose to use a transformation of the form

$$\mathbf{v} = \frac{Q\,(\mathbf{x} - \mathbf{x}_s)}{s} \qquad (2)$$

where

$\mathbf{x}$ is the Landsat signal vector after haze correction,
$\mathbf{x}_s$ is the point of all shadow,
$s$ is a brightness feature measured in the direction of soil brightness
variation (e.g., a sum over the channels $i = 1, \ldots, 4$), and
$Q$ is a dimension-reducing matrix such that

$$\mathbf{v} = \begin{pmatrix} \text{green-color feature} \\ \text{yellow-color feature} \end{pmatrix}.$$

Thus, three features would be retained for processing: $s$ and the
two components of $\mathbf{v}$.
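A minimal sketch of Eq. (2), v = Q(x - x_s)/s, with x the haze-corrected signal and x_s the point of all shadow, follows under stated assumptions: the shadow point, the soil-brightness direction `SOIL_DIR`, and the 2 × 4 matrix `Q` are made-up placeholders, not values from this report, and s is taken as the projection of (x - x_s) onto the soil direction, which is one way to satisfy the definition of s. Because s and Q(x - x_s) are both linear in (x - x_s), every point on a ray from the shadow point yields the same v.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical placeholders; none of these values come from the report.
SHADOW_POINT = [-4.0, -6.0, -3.0, -2.0]  # x_s
SOIL_DIR = [0.5, 0.5, 0.5, 0.5]          # unit vector along soil brightness
Q = [
    [-0.3, -0.6, 0.6, 0.4],              # "green-color" row
    [-0.8, 0.5, 0.1, 0.2],               # "yellow-color" row
]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def shadow_features(x: Vector) -> Tuple[float, List[float]]:
    """Return (s, v) for a haze-corrected 4-channel signal x, per Eq. (2)."""
    d = [xi - si for xi, si in zip(x, SHADOW_POINT)]  # x - x_s
    s = dot(SOIL_DIR, d)                              # brightness along soil line
    if s <= 0:
        raise ValueError("signal must lie on the bright side of the shadow point")
    v = [dot(row, d) / s for row in Q]                # two color features
    return s, v


if __name__ == "__main__":
    print(shadow_features([20.0, 18.0, 40.0, 22.0]))  # sunlit canopy
    print(shadow_features([8.0, 6.0, 18.5, 10.0]))    # same ray, more shadow
```

The two example signals lie on one ray from the shadow point: s halves while v is unchanged, so the three retained features separate "how much shadow" from "what stage of development".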


In order to use this idea we have to pick a point of all shadow to 
work with. Several comments are in order. 

a. The point of all shadow should be chosen on the extended line
of soils, even if that is not truly on the reflectance diagonal.
In this way, natural soil brightness variations will be lumped
together with shadow variation.

b. We can pick a working shadow point and use it. If we have some
success, then systematic efforts should be made to establish its
position more accurately.

c. The point of all shadow will be modified by atmosphere (haze) 
effects in the same way that any other point in the reflectance 
space will be. Any transformed features which utilize the point 
of all shadow as an origin will be dependent upon the haze level. 
Therefore, it will be necessary to carry out a correction for 
haze in order to properly exploit the color feature representa- 
tion. 
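Comments a and b suggest a practical starting point. The sketch below is one hypothetical way to pick a working shadow point: it defines the soil line through two representative soil signatures (a dark and a bright soil, both invented here), extends it, and takes the point on that extended line nearest the origin. Neither the two-endpoint line nor the nearest-to-origin rule comes from the report; a least-squares fit to many soil pixels would be a natural refinement. Following comment c, the soil signatures should already be haze-corrected.

```python
from typing import List, Sequence

Vector = Sequence[float]


def working_shadow_point(dark_soil: Vector, bright_soil: Vector) -> List[float]:
    """Point on the extended soil line closest to the origin (hypothetical rule)."""
    direction = [b - a for a, b in zip(dark_soil, bright_soil)]
    norm_sq = sum(c * c for c in direction)
    if norm_sq == 0:
        raise ValueError("soil endpoints must differ")
    # Parameter t of the foot of the perpendicular from the origin onto the
    # line dark_soil + t * direction; t < 0 lies beyond the dark-soil end.
    t = -sum(a * c for a, c in zip(dark_soil, direction)) / norm_sq
    return [a + t * c for a, c in zip(dark_soil, direction)]


if __name__ == "__main__":
    dark = [10.0, 8.0, 6.0, 4.0]      # hypothetical dark soil signature
    bright = [30.0, 28.0, 26.0, 24.0]  # hypothetical bright soil signature
    print(working_shadow_point(dark, bright))
```

For these invented soils the result is approximately (3, 1, -1, -3), which lies on the soil line extended past the dark end and is perpendicular to the line's direction.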

4.2.2 THE ESTABLISHMENT OF SIGNATURES 

In order for the computer to make the step of suggesting field ident- 
ifications to the AI, the computer must have a spectral-temporal model 
for field types available to it. A number of methods are available for 
creating such a model. 

a. Baseline LACIE Signature Extension Approach

In this approach signatures from one or a few sample segments 
are applied to the segment being worked, after an appropriate 
correction for external effects such as haze or viewing angle 
has been made. 

The main difficulty with this approach is that the sampling variance
of sample segments is large. This is demonstrated in Figure 14, which
shows a collection of cluster plots from sample segments in Kansas.
(In these figures Landsat MSS CH2 and CH3 are shown reversed from
Figure 4, so that the green development region is on the lower right
of the graphs.)

FIGURE 14. COMPARISON OF CLUSTER PLOTS (CHANNEL 2 VS CHANNEL 3) FOR 8
KANSAS SITES (The solid lines are based on the Finney Site)

All of these passes were nominally the same biophase but are 
not necessarily in the same partition with respect to details 
of available moisture, temperature history, etc. 

Attempting to extend signatures from any one sample segment to 
any other is equivalent to training AI’s on one sample segment 
only. However, the training from one sample segment which con- 
tains a wide variety of crop types might be used for a "starter" 
model. 

b. A second possible approach to building up a computer signature 
for wheat is to use an analytical model for the wheat spectral 
signature as a function of crop calendar. The model results 
should be checked against ground measurements and available 
Landsat measurements. A simulation of wheat signatures has 
been exercised and is reported under another task on this 
contract. [7] 

c. A third approach, suggested by M. Trichel of NASA/JSC, involves
the 3- and 4-pass combinations from the 1975 LACIE data base.
Each pass would be labeled with one of 16 mini-biowindows,
based on the crop calendar for the site, and each set of four
passes over a site would be regarded as a 16-component (i.e.,
4 passes × 4 channels) partial sample of a 64-component (i.e.,
16 passes × 4 channels) vector. Data from the labeled wheat
fields (from AI interpretations) would be used to estimate a
16-pass wheat signature, following the research results of
T. Bouillon [16].

d. A fourth approach would be to use the delta classifier [12] to 
classify the blob means. (If this were employed, the haze cor- 
rection and the nonlinear correction shown in Figure 3 would 
not be used.) 
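Approach c can be pictured with a small data structure. The sketch below is a simple stand-in, not Bouillon's estimator: it places each field's labeled passes into their mini-biowindow slots of a 64-component (16 windows × 4 channels) vector, leaves the other slots empty, and averages whatever observations each component receives across wheat fields (an available-case mean). Window indices and channel values are invented, and two passes per field are used for brevity where the approach calls for three or four.

```python
from typing import Dict, List, Optional, Sequence

N_WINDOWS = 16   # mini-biowindows
N_CHANNELS = 4   # Landsat MSS channels


def embed(passes: Dict[int, Sequence[float]]) -> List[Optional[float]]:
    """Map {window index: 4-channel mean} to a 64-slot vector with gaps (None)."""
    full: List[Optional[float]] = [None] * (N_WINDOWS * N_CHANNELS)
    for window, channels in passes.items():
        if not 0 <= window < N_WINDOWS or len(channels) != N_CHANNELS:
            raise ValueError("bad window index or channel count")
        for c, value in enumerate(channels):
            full[window * N_CHANNELS + c] = value
    return full


def available_case_mean(samples: List[List[Optional[float]]]) -> List[Optional[float]]:
    """Average each of the 64 components over the samples that observed it."""
    signature: List[Optional[float]] = []
    for k in range(N_WINDOWS * N_CHANNELS):
        seen = [s[k] for s in samples if s[k] is not None]
        signature.append(sum(seen) / len(seen) if seen else None)
    return signature


if __name__ == "__main__":
    field_a = embed({0: [1.0, 2.0, 3.0, 4.0], 3: [5.0, 6.0, 7.0, 8.0]})
    field_b = embed({0: [3.0, 4.0, 5.0, 6.0], 9: [9.0, 9.0, 9.0, 9.0]})
    print(available_case_mean([field_a, field_b])[:16])
```

Components never observed stay `None`, which makes explicit how much of the 16-pass signature the available 3- and 4-pass combinations actually constrain.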
It is important to note that the system configuration shown in Figure 1(e)
can be a viable one even when the quality of the computer's identification
is rather poor. The AI has the opportunity to disagree with the computer
designation, and the computer has the opportunity to learn from these
disagreements. Once the system is set up and operating in a local proportion
estimation mode, and the AI resources required are being reduced by the
computer assistance in organizing the data in the sample segment, formal
provision can be made for the learning process to occur again, using results
of ongoing research in the SR&T community. As the computer becomes more
expert at identifying blobs and clusters, the AI resources required are
further reduced.

Whatever the formal learning process might be, it will be required
to provide the computer with the same information that the AI has access
5 CONCLUSIONS AND RECOMMENDATIONS

The net result of this work to date has been the creation of a 
conceptual man-machine system framework for a large scale agricultural 
remote sensing system and the generation of some specific elements of 
that system. The system is based on and can grow out of the local
recognition mode of LACIE, through a gradual transition wherein computer
support functions supplement and replace AI functions.

Local proportion estimation functions are broken into two broad
classes: organization of the data within the sample segment using spectral
and spatial information within the sample segment; and identification
of the fields or groups of fields in the sample segment. A set of
computer programs has been implemented which assists the AI in organizing
the sample segment, accepts the AI’s identification of fields, and
tallies the resulting classifications to produce a proportion estimate.

The central computer programs which accomplish these results are modifi- 
cations of existing ERIM programs. Thus, spectral-spatial clustering 
is accomplished by modifying an existing clustering algorithm to accept 
line and point numbers as channels. Blob clustering is accomplished by 
modifying the same program to accept and cluster the means of blobs rather
than pixel values. Another existing program was used to strip the boun- 
dary pixels from blobs. A special program had to be written to organ- 
ize the blob tables. An existing program was used to tally the area of 
wheat inside of a sample segment, given the blob or cluster identifica- 
tions. 
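Two of these steps are easy to sketch. The first function appends a pixel's line and point numbers to its four channel values, so that an ordinary spectral clustering program also groups pixels by location; `spatial_weight` is a hypothetical knob, not a parameter of the ERIM programs. The second tallies the wheat proportion from per-pixel blob labels and the AI's blob identifications, treating the spring and winter wheat codes of Appendix I (100 to 126 and 400 to 424) as wheat and ignoring pixels in unidentified blobs, for example stripped boundary pixels. All function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence


def spectral_spatial_features(channels: Sequence[float], line: int, point: int,
                              spatial_weight: float = 1.0) -> List[float]:
    """Append scaled line and point numbers as two extra 'channels'."""
    return list(channels) + [spatial_weight * line, spatial_weight * point]


def is_wheat(crop_code: int) -> bool:
    """Spring wheat (100-126) or winter wheat (400-424), per Appendix I."""
    return 100 <= crop_code <= 126 or 400 <= crop_code <= 424


def wheat_proportion(pixel_blobs: Sequence[int], blob_codes: Dict[int, int]) -> float:
    """Fraction of pixels in AI-identified blobs whose blob was called wheat."""
    counts = Counter(pixel_blobs)
    total = sum(n for blob, n in counts.items() if blob in blob_codes)
    wheat = sum(n for blob, n in counts.items()
                if blob in blob_codes and is_wheat(blob_codes[blob]))
    return wheat / total if total else 0.0


if __name__ == "__main__":
    print(spectral_spatial_features([21.0, 18.0, 40.0, 22.0], line=12, point=57))
    # Blob 1 -> 400 (winter wheat), blob 2 -> 700 (summer fallow), blob 3 -> 402.
    print(wheat_proportion([1, 1, 2, 3, 3, 3], {1: 400, 2: 700, 3: 402}))
```

Weighting the line and point numbers trades spatial compactness against spectral purity of the resulting blobs; the right value would have to be found by experiment.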

A few examples of the exercise of this function have been produced, 
and await critical evaluation. 

The structure of Landsat data has been explored with the objective
of establishing a conceptual basis for computer identification of crops.
A heuristic view of the spectral-temporal structure of Landsat data is
described in the Tasselled Cap. This heuristic idea is used to describe
a system incorporating several interlinked steps (cloud removal, haze
correction, nonlinear feature extraction, and classification according
to a signature model) in support of local field identification by AI
and by computer. Possible methods of obtaining a signature model for
wheat are outlined.

With respect to the first broad function, sample segment organiza- 
tion and proportion estimation, there are numerous possible variations 
on the theme. We have implemented one of them and it is well enough 
defined to be implemented elsewhere as a research tool. The most immedi- 
ate need is to allow working AI's to obtain some experience with this 
tool. This might be accomplished by modifying existing utility programs 
at Johnson Space Center and writing others to accomplish these functions. 
To be implemented for formal test and evaluation for possible inclusion 
in LACIE, the approach would have to be specified at a greater level of 
detail than given in this report. However, the specification would not
require an inordinate additional effort. 

With respect to the second broad function, field identification, 
many degrees of implementation are possible. All demand some measure 
of correction for external effects and this is in the process of being 
implemented. The implementation of a starter signature model which can 
be used to tentatively identify blobs is a short next step. However, 
the institution of an AI-computer interactive loop in which the signature
is gradually improved by the recorded corrections from the AI
is a longer lead time function. We recommend that research on this topic
be undertaken in the SR&T program soon. The analysis should include
ancillary data available both to the AI and to the computer, especially
parameters relating to crop calendar, as part of the definition of
signature.
APPENDIX I
CROP CODES

Code        Crop

100         Spring Wheat (General)
101-126     Spring Wheat Varieties
200         Barley (General)
201-216     Barley Varieties
300         Oats (General)
301-307     Oat Varieties
400         Winter Wheat (General)
401-424     Winter Wheat Varieties
500         Grasses/Pasture
600         Other Crops (General)
601-618     Other Crops (Specific)
700         Summer Fallow
800         Non-Agricultural


Field No.   ID      Field No.   ID      Field No.   ID

 1          700     13          400     26          700
 2          700     14          700     27          700
 3          404     15          402     28          500
 4          402     16          400     29          400
 5          700     17          400     30          700
 7          402     18          500     31          400
 8          700     19          700     32          402
 9          402     20          616     33          700
10          700     21          616     34          404
11          700     23          700     35          700
12          400     25          404     36          616
Field No.   ID      Field No.   ID      Field No.   ID

37          408      82         607     123         500
39          404      83         616     124         500
40          400      85         700     125         500
41          402      86         700     126         402
42          700      87         402     127         700
43          400      88         607     128         700
44          400      89         408     129         402
46          400      91         408     130         607
47          700      93         404     131         402
48          616      94         404     134         408
49          408      97         606     135         607
51          500      98         700     137         700
52          400      99         408     139         402
53          500     102         700     140         700
54          400     103         616     141         700
55          407     106         616*    144         400
56          404     106         607     147         402
58          700     108         402     148         500
59          500     110         402     149         500
60          616     111         607     150         409
61          607     113         409     151         607
62          616     114         700     152         408
63          402     115         402     153         402
64          700     116         700     154         607
66          402     117         402     155         408
73          700     118         700     158         616
74          700     119         607     159         700
75          700     120         616     160         700
76          700     121         607     161         700
77          404     122         700     163         700

* This is the way it is shown in our ground information.
Field No.   ID      Field No.   ID

164         500     209         616
166         700     210         700
167         400     211         600
168         607     214         602
169         607     215         602
170         607     217         500
171         402
173         700
174         607
175         700
176         700
177         400
179         607
180         607
182         700
185         400
186         400
187         400
188         607
189         607
192         607
193         607
194         700
196         402
198         402
200         400
201         607
206         607
207         700
208         616




